PyDigger - unearthing stuff about Python


Name Version Summary Date
kreuzberg 3.15.0 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-09-14 18:14:57
pdf2markdown 0.3.0 Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-structured Markdown 2025-09-14 02:02:58
qdrant-loader 0.7.3 A tool for collecting and vectorizing technical content from multiple sources and storing it in a QDrant vector database. 2025-09-11 07:33:39
bank-statement-separator 0.3.0 AI-powered tool for separating multi-statement PDF files using LangChain and LangGraph 2025-09-10 14:48:39
docstrange 1.1.6 Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, JSON, CSV, HTML) with intelligent content extraction and advanced OCR. 2025-09-10 09:27:30
docling-onnx-models 0.1.3 ONNX Runtime implementations for Docling AI models 2025-09-09 08:45:47
mseep-kreuzberg 3.13.4 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-09-09 03:44:56
pydatamax 0.2.0 Advanced Data Crawling and Processing Framework 2025-09-03 17:39:42
docuglean-ocr 1.0.0 An SDK for intelligent document processing using SOTA VLLM models 2025-09-02 13:19:12
contextgem 0.18.0 Effortless LLM extraction from documents 2025-09-01 21:07:54
docx-mcp 0.1.4 DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations 2025-08-31 18:11:33
dddocr-py 0.1.0 Python client for the 3DOCR.com OCR API 2025-08-30 19:16:43
wizarddocx 1.0.0 Text extraction from Microsoft Word files. Parses Word documents natively and can optionally run local OCR with Tesseract for embedded images or scanned pages. Supports page selection and bytes input. Legacy .doc is read-only and OCR is not available. 2025-08-28 09:27:49
mcp-gosling 0.1.0 MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library 2025-08-25 02:12:32
smartloop 1.3.2 Smartloop Command Line interface to process documents using LLM 2025-08-24 17:55:11
ocr-detection 0.4.1 A Python library to detect whether PDF pages contain extractable text or are scanned images requiring OCR 2025-08-22 07:27:10
qagen 0.1.1 A powerful Chinese document QA pairs generation and validation tool with multiple LLM support 2025-08-21 10:17:34
inkognito 0.1.0 Privacy-first document processing FastMCP server with PII anonymization 2025-08-13 17:45:52
xml-analysis-framework 1.4.4 XML document analysis and preprocessing framework designed for AI/ML data pipelines 2025-08-12 04:21:41
raggy 0.3.5 scraping stuff 2025-08-11 14:49:05